Detection precision is low when only a few diatom training samples are available. To address this, a few-shot diatom detection model based on Multi-scale Multi-head Self-attention (MMS) and Online Hard Example Mining (OHEM), named MMSOFDD, was proposed on the basis of the few-shot object detection model Two-stage Fine-tuning Approach (TFA). First, a Transformer-based feature extraction network, Bottleneck Transformer Network-101 (BoTNet-101), was constructed by combining ResNet-101 with a multi-head self-attention mechanism, so that both the local and the global information in diatom images is fully used. Then, the multi-head self-attention was improved into MMS, which removes the original mechanism's limitation of handling only a single object scale. Finally, OHEM was introduced into the predictor of the model, which identifies and localizes the diatoms. Ablation experiments, as well as comparison experiments between the proposed model and other few-shot object detection models, were conducted on a self-constructed diatom dataset. Experimental results show that the mean Average Precision (mAP) of MMSOFDD is 69.60%, an improvement of 5.89 percentage points over the 63.71% of TFA. Compared with the few-shot object detection models Meta R-CNN (61.60%) and Few-Shot In Wild (FSIW) (60.90%), the proposed model improves mAP by 8.00 and 8.70 percentage points respectively. These results indicate that MMSOFDD effectively improves diatom detection precision when diatom training samples are scarce.
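The abstract does not say how OHEM is configured inside the MMSOFDD predictor. The sketch below illustrates only the generic OHEM idea under stated assumptions: compute a loss for every candidate region (RoI), keep the `num_hard` RoIs with the largest loss, and average the loss over those alone, so that easy, already well-classified RoIs contribute nothing. The names `softmax_cross_entropy`, `ohem_select`, `ohem_loss` and `num_hard` are hypothetical. The sketch uses a classification loss only, omits the NMS-over-losses step of the original OHEM formulation, and computes a plain scalar with no gradient machinery.

```python
import math
from typing import List, Sequence, Tuple


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Per-RoI classification loss: -log softmax(logits)[label], computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def ohem_select(losses: Sequence[float], num_hard: int) -> List[int]:
    """Return the (sorted) indices of the num_hard RoIs with the largest loss."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:num_hard])


def ohem_loss(
    roi_logits: Sequence[Sequence[float]],
    roi_labels: Sequence[int],
    num_hard: int,
) -> Tuple[float, List[int]]:
    """Average loss over the hardest RoIs only (hypothetical OHEM sketch)."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(roi_logits, roi_labels)]
    num_hard = min(num_hard, len(losses))
    if num_hard < 1:
        raise ValueError("need at least one RoI and num_hard >= 1")
    hard = ohem_select(losses, num_hard)
    return sum(losses[i] for i in hard) / len(hard), hard


# Toy batch: class 0 = background, class 1 = diatom (assumed labels).
logits = [
    [4.0, -4.0],  # background, confidently correct -> easy
    [-3.0, 3.0],  # diatom, confidently correct     -> easy
    [2.0, 0.0],   # diatom scored as background     -> hard
    [0.0, 1.5],   # background scored as diatom     -> hard
]
labels = [0, 1, 1, 0]
loss, hard = ohem_loss(logits, labels, num_hard=2)
print(hard, round(loss, 4))
```

With `num_hard=2`, only the two misclassified RoIs (indices 2 and 3) are kept, so the training signal concentrates on the examples the detector currently gets wrong, which is the motivation for using OHEM when training data is limited.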